fix(issue): Safety: crossReferenceEvidence commented out, fabricated evidence bypasses check #6168

Open
jeremymcs wants to merge 2 commits into main from issue/5938-safety-crossreferenceevidence-commented--1778892810

Conversation

@jeremymcs
Collaborator

@jeremymcs jeremymcs commented May 16, 2026

Summary

  • Replaced the zero-bash-calls evidence fallback with warning logic driven by cross-reference mismatches; the focused safety tests pass.

Verification

  • Completed in the repository worktree before push.

Related Issue

Repo

  • gsd-build/gsd-2

Branch

  • issue/5938-safety-crossreferenceevidence-commented--1778892810

Summary by CodeRabbit

  • Refactor
    • Enhanced the command verification system to more accurately detect and report when claimed verification commands are not found in executed logs, providing improved safety notifications when discrepancies are identified.

Review Change Stack

@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026

Warning

Rate limit exceeded

@jeremymcs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 49 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bed69dc8-befb-4bbb-a92b-4d5fb9d87203

📥 Commits

Reviewing files that changed from the base of the PR and between 3647bc0 and 151f2a3.

📒 Files selected for processing (1)
  • src/resources/extensions/gsd/auto-post-unit.ts
📝 Walkthrough

The PR refactors the post-unit evidence cross-reference safety harness in auto-post-unit.ts. Instead of deriving a separate bashCalls array to detect missing commands, it now filters the existing mismatches set for warning-level entries where actual is null, logging a safety warning and notifying the UI when claimed commands are not found in recorded bash calls.
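
The filter described in the walkthrough can be sketched as follows. This is a minimal illustration, not the code in auto-post-unit.ts: the `EvidenceMismatch` shape, the `claimed` field, and the `missingClaimedCommands` helper are assumptions made for the example; only the `severity === "warning"` and `actual === null` conditions come from the PR discussion.

```typescript
// Hypothetical shape of one cross-reference mismatch; the real type in the
// repository may differ.
type Severity = "warning" | "blocking";

interface EvidenceMismatch {
  severity: Severity;
  claimed: string; // command the unit claims it ran
  actual: string | null; // matching recorded bash call, or null if none was found
}

// Keep only warning-level mismatches whose claimed command has no recorded call.
function missingClaimedCommands(mismatches: EvidenceMismatch[]): string[] {
  return mismatches
    .filter((m) => m.severity === "warning" && m.actual === null)
    .map((m) => m.claimed);
}

const sample: EvidenceMismatch[] = [
  { severity: "warning", claimed: "npm test", actual: null },
  { severity: "blocking", claimed: "npm run build", actual: "npm run build" },
  { severity: "warning", claimed: "npm run lint", actual: "npm run lint --fix" },
];

// Only "npm test" is reported: it is the one warning with no recorded call.
const missing = missingClaimedCommands(sample);
console.log(missing);
```

Filtering the existing mismatch list avoids deriving a second list of bash calls, which is the simplification the walkthrough describes.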

Changes

Evidence cross-reference safety harness

| Layer / File(s) | Summary |
| --- | --- |
| Evidence cross-reference safety refactoring (src/resources/extensions/gsd/auto-post-unit.ts) | Removed the prior bashCalls array derivation and replaced it with filtering mismatches for warning-severity entries with actual === null, then logging and notifying the UI of missing claimed commands. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • gsd-build/gsd-2#5810: Both PRs modify the src/resources/extensions/gsd/auto-post-unit.ts safety/evidence cross-reference flow by changing how claimed bash/verification evidence is matched to recorded evidence.
  • gsd-build/gsd-2#6074: Both PRs touch the GSD evidence cross-reference flow in auto-post-unit.ts by changing how mismatches for missing recorded bash calls are computed and surfaced.

Suggested labels

bug

Poem

🐰 A fuzzy hop through safety checks refined,
No more redundant lists to find,
Just filter mismatches with care,
And warn the UI: commands weren't there!
Evidence cross-reference, cleaner and wise. 🔍

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the safety fix for fabricated evidence bypass when crossReferenceEvidence is not properly used. |
| Linked Issues check | ✅ Passed | The pull request implements all key objectives: re-enables crossReferenceEvidence logic, replaces zero-bash-calls check with command-level matching, detects fabricated evidence mismatches, and logs warnings when claimed commands are missing. |
| Out of Scope Changes check | ✅ Passed | All changes in the pull request are directly related to the linked issue objective of fixing the fabricated evidence bypass by improving safety verification logic. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
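
The difference between a zero-bash-calls check and command-level matching, as named in the Linked Issues check, can be illustrated with a small sketch. Both functions below are hypothetical stand-ins, not the repository's crossReferenceEvidence; in particular, exact string matching after trimming is an assumption of this example.

```typescript
// Hypothetical zero-bash-calls check: fires only when nothing was recorded at all.
function zeroBashCheck(recorded: string[]): boolean {
  return recorded.length === 0;
}

// Hypothetical command-level check: returns claimed commands with no recorded match.
// Exact equality after trimming is an assumption of this sketch.
function unmatchedClaims(claimed: string[], recorded: string[]): string[] {
  const seen = new Set(recorded.map((c) => c.trim()));
  return claimed.filter((c) => !seen.has(c.trim()));
}

// Unrelated bash calls were recorded, but the claimed verification never ran.
const recorded = ["ls -la", "git status"];
const claimed = ["npm test"];

const flaggedByZeroCheck = zeroBashCheck(recorded); // false: some bash calls exist
const unmatched = unmatchedClaims(claimed, recorded); // contains "npm test"
console.log({ flaggedByZeroCheck, unmatched });
```

In this scenario the count-based check stays silent because some bash calls exist, while the per-command comparison still reports the claimed command as missing.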

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@github-actions
Contributor

github-actions Bot commented May 16, 2026

🔴 PR Risk Report — CRITICAL

Files changed 2
Systems affected 1
Overall risk 🔴 CRITICAL

Affected Systems

| Risk | System |
| --- | --- |
| 🔴 critical | Auto Engine |

File Breakdown

| Risk | File | Systems |
| --- | --- | --- |
| 🔴 | src/resources/extensions/gsd/auto-post-unit.ts | Auto Engine |
| | src/resources/extensions/gsd/tests/evidence-cross-ref.test.ts | (unclassified) |

⚠️ 🔴 Critical risk — the following systems require verification before merge:

  • 🔴 Auto Engine: validate auto-mode trigger conditions and loop termination

⛔ This PR should not be merged without executing this follow-up prompt.

Ask your coding agent to verify before submitting:

Review this PR for risks in: Auto Engine. Verify:

1. validate auto-mode trigger conditions and loop termination

Before modifying any code, assess the scope of this fix:

- Identify the root cause, not just the reported symptom.
- Search the codebase for other call sites, similar patterns, or duplicated logic that may share the same bug.
- List affected tests, documentation, and any downstream consumers that depend on the current behavior.
- Flag any changes that extend beyond the immediate file or function.

Report findings first. Then propose a fix scoped to the actual root cause, and wait for confirmation before applying changes outside the originally reported location.

💡 Have a Codex subscription? Get an independent second opinion: codex review --adversarial

@jeremymcs jeremymcs force-pushed the issue/5938-safety-crossreferenceevidence-commented--1778892810 branch from 3647bc0 to 151f2a3 on May 18, 2026 16:09
@jeremymcs
Collaborator Author

Started an automated PR update.

Accepted review feedback is being applied.
Progress and final results will be posted in the related review thread(s).

@jeremymcs
Collaborator Author

Addressed in commit 151f2a3.

Responding to comment by @coderabbitai[bot]:

This is a CodeRabbit rate-limit notification, so no actionable code changes are required. The walkthrough's description of the PR changes is informational, and the rate limit resolves automatically after the timeout.

@jeremymcs
Collaborator Author

Addressed in commit 151f2a3.

Responding to comment by @github-actions[bot]:

🔴 PR Risk Report — CRITICAL

Verified auto-mode trigger conditions and loop termination:

  • The evidence cross-reference check fires only for execute-task units with safetyConfig.evidence_cross_reference enabled.
  • Blocking mismatches (exit-code fabrication) correctly call pauseAuto() and return "dispatched" to terminate the loop.
  • Warning-level mismatches (missing bash calls) are intentionally non-blocking.
  • The whole block is guarded by try/catch.

The refactoring from a separate bashCalls derivation to filtering mismatches where severity === "warning" && actual === null is semantically identical per evidence-cross-ref.ts:60-67. No code changes required.
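
The control flow described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code in auto-post-unit.ts: the `Deps` interface, the `Unit` and `Mismatch` shapes, the `checkEvidence` name, and the `"continue"` return value are all hypothetical. Only `pauseAuto()`, the `"dispatched"` return value, the `execute-task` unit type, and `evidence_cross_reference` are named in the thread.

```typescript
type Severity = "warning" | "blocking";
interface Mismatch { severity: Severity; claimed: string; actual: string | null; }
interface Unit { type: string; }
interface SafetyConfig { evidence_cross_reference: boolean; }

// Hypothetical dependencies, injected so the sketch is self-contained.
interface Deps {
  crossReference: () => Mismatch[];
  pauseAuto: () => void;
  notify: (message: string) => void;
}

// Returns "dispatched" when the loop should stop; "continue" is a placeholder
// for whatever the real code does otherwise.
function checkEvidence(unit: Unit, cfg: SafetyConfig, deps: Deps): "dispatched" | "continue" {
  if (unit.type !== "execute-task" || !cfg.evidence_cross_reference) return "continue";
  try {
    const mismatches = deps.crossReference();
    if (mismatches.some((m) => m.severity === "blocking")) {
      deps.pauseAuto(); // blocking mismatch: stop auto mode
      return "dispatched";
    }
    const missing = mismatches.filter((m) => m.severity === "warning" && m.actual === null);
    if (missing.length > 0) {
      // Warning only: report and keep going.
      deps.notify(`Claimed commands not found: ${missing.map((m) => m.claimed).join(", ")}`);
    }
  } catch {
    // A failure inside the safety check must not break the post-unit flow.
  }
  return "continue";
}

// Usage: a warning-level mismatch notifies but does not stop the loop.
const events: string[] = [];
const makeDeps = (mismatches: Mismatch[]): Deps => ({
  crossReference: () => mismatches,
  pauseAuto: () => { events.push("paused"); },
  notify: (message) => { events.push(message); },
});

const result = checkEvidence(
  { type: "execute-task" },
  { evidence_cross_reference: true },
  makeDeps([{ severity: "warning", claimed: "npm test", actual: null }]),
);
console.log(result, events);
```

Injecting the dependencies keeps the sketch testable in isolation; the real module presumably calls its own helpers directly.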

Development

Successfully merging this pull request may close these issues.

Safety: crossReferenceEvidence commented out, fabricated evidence bypasses check

1 participant